Statistical Sandhi Splitter and its Effect on NLP Applications

نویسندگان

  • Prathyusha Kuncham
  • Kovida Nelakuditi
  • Radhika Mamidi
چکیده

This paper revisits the work of (Kuncham et al., 2015) which developed a statistical sandhi splitter (SSS) for agglutinative languages that was tested for Telugu and Malayalam languages. Handling compound words is a major challenge for Natural Language Processing (NLP) applications for agglutinative languages. Hence, in this paper we concentrate on testing the effect of SSS on the NLP applications like Machine Translation, Dialogue System and Anaphora Resolution and show that the accuracy of these applications is consistently improved by using SSS. We shall also discuss in detail the performance of SSS on these applications.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Statistical Sandhi Splitter for Agglutinative Languages

Sandhi splitting is a primary and an important step for any natural language processing (NLP) application for languages which have agglutinative morphology. This paper presents a statistical approach to build a sandhi splitter for agglutinative languages. The input to the model is a valid string in the language and the output is a split of that string into meaningful word/s. The approach adopte...

متن کامل

Significance of an Accurate Sandhi-Splitter in Shallow Parsing of Dravidian Languages

This paper evaluates the challenges involved in shallow parsing of Dravidian languages which are highly agglutinative and morphologically rich. Text processing tasks in these languages are not trivial because multiple words concatenate to form a single string with morpho-phonemic changes at the point of concatenation. This phenomenon known as Sandhi, in turn complicates the individual word iden...

متن کامل

A Sandhi Splitter for Malayalam

Sandhi splitting is the primary task for computational processing of text in Sanskrit and Dravidian languages. In these languages, words can join together with morpho-phonemic changes at the point of joining. This phenomenon is known as Sandhi. Sandhi splitter splits the string of conjoined words into individual words. Accurate execution of sandhi splitting is crucial for text processing tasks ...

متن کامل

External Sandhi and its Relevance to Syntactic Treebanking

External sandhi is a linguistic phenomenon which refers to a set of sound changes that occur at word boundaries. These changes are similar to phonological processes such as assimilation and fusion when they apply at the level of prosody, such as in connected speech. External sandhi formation can be orthographically reflected in some languages. External sandhi formation in such languages, causes...

متن کامل

The Effect of Square Splittered and Unsplittered Rods in Flat Plate Heat Transfer Enhancement

A square splittered and unsplittered rod is placed in a turbulent boundary layer developed over a flat plate. The effect of the resulting disturbances on the local heat transfer coefficient is then studied. In both cases the square rod modifies the flow structure inside the boundary layer. As a result, a stagnation point, a jet and wake area are generated around the square rod, each making a co...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015